Abstract

Individuals working together as a team can often solve hard problems beyond the capability
of any individual in the team. Cooperative optimization is a newly proposed general method for attacking
hard optimization problems, inspired by the cooperation principles of team playing. It has an established
theoretical foundation and has demonstrated outstanding performance in solving real-world optimization
problems. Under some general settings, a cooperative optimization algorithm has a unique equilibrium and
converges to it at an exponential rate, regardless of initial conditions and insensitive to perturbations.
It also possesses a number of global optimality conditions for identifying global optima so that it
can terminate its search process efficiently. This paper offers a general description of cooperative
optimization, addresses a number of design issues, and presents a case study to demonstrate its power.



I. Introduction 

Optimization is a core problem both in mathematics and computer science. It is a very active 
research area with many international conferences every year, a large amount of literature, and 
many researchers and users across many fields for a wide range of applications. Combinatorial 
optimization [1], [2] is a branch of optimization where the set of feasible solutions of problems 
is discrete, countable, and of a finite size. The general methods for combinatorial optimization
are 1) local search [3], 2) simulated annealing [4], [5], 3) genetic algorithms [6], [7], [8], 4)
ant colony optimization [9], 5) tabu search [10], 6) branch-and-bound [11], [12], and 7) dynamic
programming [12]. Successful applications of different combinatorial optimization methods
have been reported in solving a large variety of optimization problems in practice.

Optimization is important in the areas of computer vision, pattern recognition, and image 
processing. For example, stereo matching is one of the most active research problems in computer 
vision [13], [14], [15], [16]. The goal of stereo matching is to recover the depth image of a scene 
from a pair of 2-D images of the same scene taken from two different locations. Like many 
other problems from these areas, it can be formulated as a combinatorial optimization problem, 
which is NP-hard [17] in computational complexity in general. 

Researchers in computer vision have developed a number of search techniques which have
proven effective in practice for finding good solutions to combinatorial optimization problems.
Two well-known ones are the cooperative algorithm proposed by D. Marr and T. Poggio
in [16] for stereo matching and the probabilistic relaxation proposed by A. Rosenfeld et al. [18]
for scene labeling.

Recently, there has been remarkable progress in discovering new optimization methods for
solving computer vision problems. Graph cuts [14], [19], [13], [20] is a powerful specialized
optimization technique popular in computer vision. It has the best known results in energy
minimization in the two recent evaluations of stereo algorithms [13], [21], and is more powerful than
the classic simulated annealing method. However, graph cuts has a limitation in its scope because
it is only applicable when the energy minimization of a vision problem can be reduced to the
problem of finding the minimum cut in a graph [20].

The second optimization method is the so-called sum-product algorithm [22], a generalized
belief propagation algorithm developed in AI [23]. The sum-product algorithm is the most powerful
optimization method found so far for attacking hard optimization problems arising from
channel decoding in communications. The min-sum algorithm and max-product algorithm [24],
[25] are its variations. It has also been successfully applied to several computer vision
problems with promising experimental results [26].

The third method proposed recently is the so-called max-product tree-reweighted message
passing [27]. It is based on a lower bounding technique called linear programming relaxation. Its
improvement has been proposed recently and its successful applications in computer vision have
been reported [28].

Cooperative optimization is a newly discovered general optimization method for attacking
hard optimization problems [29], [30], [31], [32]. Experiments [33], [34], [35], [36], [37] have
found that cooperative optimization achieves remarkable performance in solving
a number of real-world NP-hard problems with the number of variables ranging from thousands
to hundreds of thousands. The problems span several areas, demonstrating its generality and power.

For example, cooperative optimization algorithms have been proposed for DNA image
analysis [33], shape from shading [32], stereo matching [30], [34], and image segmentation [38]. In
the second case, it significantly outperformed classic simulated annealing in finding globally
optimal solutions. In the third case, its performance is comparable with graph cuts in terms of
solution quality, and it is twice as fast as graph cuts in software simulation using the common
evaluation framework for stereo matching [13]. In the fourth case, it is ten times faster than graph
cuts and has reduced the error rate by a factor of two to three. In all these cases, its memory usage
is efficient and fixed, and its operations are simple, regular, and fully scalable. All these features
make it suitable for parallel hardware implementations.

This paper is organized around three major themes: 1) a formal presentation of cooperative
optimization, 2) design issues, and 3) a case study. They generalize and extend
the previous papers on cooperative optimization. In the case study, another cooperative
optimization algorithm for stereo matching, besides the one proposed before [30], [34], is offered
to demonstrate the power and flexibility of cooperative optimization. Compared with the previous
one for stereo matching, the new one lowers the energy levels of solutions further and is more
than ten times faster. Just like the previous one, the new one is also simple in computation and
fully parallel in operations, making it suitable for hardware implementations.

II. Cooperative Multi-Agent System for Distributed Optimization 

Different forms of cooperative optimization can be derived from different cooperation schemes. 
The basic form defines an important collection of cooperative optimization algorithms. There are 
two different ways to derive it; namely, 1) as a cooperative multi-agent system for distributed 
optimization and 2) as a lower bounding technique for finding global optima. Each way
offers its own inspirations and insights for understanding the algorithms. This section describes the
first way. The following section offers the description for the second way. Readers who are
not interested in them can jump directly to Section V for a general description of cooperative
optimization. Those three sections are relatively independent of each other.

A. Inspiration and Basic Ideas 

Team playing is a common social behavior among individuals of the same species (or of different
species) where the team members working together can achieve goals or solve hard problems which
are beyond the capability of any member in the team. Often, team playing is achieved
through competition and cooperation among the members in a team. Usually, competition or
cooperation alone can hardly lead to good solutions either for a team or for the individuals 
in the team. Without competition, individuals in a team may lose motivation to pursue better 
solutions. Without cooperation, they might directly conflict with each other and poor solutions 
might be reached both for the team and themselves. Through properly balanced competition and 
cooperation, individuals in a team can find the best solutions for the team and possibly good 
solutions for themselves at the same time. 

In the terms of computer science, we can view a team of this kind as a cooperative system 
with multiple agents. In the system, each agent has its own objective. The collection of all
the agents' objectives forms the objective of the system. We can use a cooperative system to
solve a hard optimization problem following the divide-and-conquer principle. We first break up
the objective function of the optimization problem into a number of sub-objective functions of
manageable sizes and complexities. Following that, we assign each sub-objective function to an
agent in a system as the agent's own objective function and ask those agents in the system to
optimize their own objective functions through competition and cooperation. (Throughout this
paper, we use the terms "objective" and "objective function" interchangeably since the objective
of an optimization problem is defined by an objective function and this paper focuses only on
optimizing objective functions.)

Specifically, the competition is achieved by asking each agent to optimize its own objective
function by applying problem-specific optimization methods or heuristics. However, the objectives
of agents may not always be aligned with each other. In other words, the best solutions
of the agents for optimizing their own objective functions may conflict with each other. To
resolve the conflicts, each agent passes its solution to its neighbors through local message
passing. After receiving its neighbors' solutions, each agent compromises its solution with the
solutions of its neighbors. The solution compromising is achieved by modifying the objective
function of each agent to take into account its neighbors' solutions. It is important to note that
solution compromising among agents is a key concept for understanding the cooperation strategy
introduced by cooperative optimization.

Initialization

  For each individual i in the system, find the initial solution,

    Find_Solution(Objective(i)) => Solution(i, t = 0);

Iteration

  For each individual i in the system,

    Modify its original objective by including its neighbors' solutions,

      Modify_Objective(Objective(i), {Solution(j, t) | j ∈ Neighbors(i)})
          => Objective(i, t + 1);

    Find solutions of the modified objective,

      Find_Solution(Objective(i, t + 1)) => Solution(i, t + 1);

Fig. 1. Cooperative Multi-Agent System for Distributed Optimization.

Let the objective of the individual i be Objective(i). Let the solution of the individual i at
time t be Solution(i, t). Let the collection of solutions of the neighbors of the individual i at
time t be {Solution(j, t) | j ∈ Neighbors(i)}. The basic operations of a cooperative system are
organized as a process shown in Figure 1.

The process of a cooperative system of this kind is iterative and self-organized and each agent
in the system is autonomous. The system is also inherently distributed and parallel, making the
entire system highly scalable and less vulnerable to perturbations and disruptions on individuals
than a centralized system. Despite its simplicity, it has many interesting emergent behaviors
and can attack many challenging optimization problems.
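
The process of Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that do not come from the paper: each objective is a table mapping candidate values to costs, `find_solution` returns the cheapest value, and `modify_objective` (with a hypothetical blending weight `lam`) penalizes disagreement with the neighbors' current solutions.

```python
def find_solution(objective):
    """Return the value with the lowest cost in a {value: cost} table."""
    return min(objective, key=objective.get)


def modify_objective(objective, neighbor_solutions, lam=0.5):
    """Blend the original costs with a penalty for disagreeing with neighbors."""
    return {
        value: (1 - lam) * cost + lam * sum(value != s for s in neighbor_solutions)
        for value, cost in objective.items()
    }


def run_cooperative_system(objectives, neighbors, steps, lam=0.5):
    """Iterate the loop of Figure 1 for a fixed number of steps."""
    # Initialization: every agent solves its own objective independently.
    solutions = [find_solution(obj) for obj in objectives]
    for _ in range(steps):
        # Each agent modifies its ORIGINAL objective using the neighbors'
        # solutions from time t; all agents update simultaneously.
        modified = [
            modify_objective(objectives[i], [solutions[j] for j in neighbors[i]], lam)
            for i in range(len(objectives))
        ]
        solutions = [find_solution(obj) for obj in modified]
    return solutions


# Two agents that initially disagree; the cost tables are made up for illustration.
objectives = [{0: 0.0, 1: 3.0}, {0: 0.5, 1: 0.0}]
neighbors = [[1], [0]]
print(run_cooperative_system(objectives, neighbors, steps=0))  # [0, 1]
print(run_cooperative_system(objectives, neighbors, steps=3))  # [0, 0]
```

In this toy run the second agent gives up its preferred value once it sees the first agent's solution. Note that the sketch exchanges hard solutions; the basic form derived in the next subsection exchanges unary functions instead.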

B. Basic Form of Cooperative Optimization 

In light of the cooperative multi-agent system for distributed optimization described in Fig. 1,
we can now derive the basic form of cooperative optimization. It is based on a direct way of
defining the solution of each agent and a simple way to modify the objective of each agent.
The derivation can be generalized further in a straightforward way to any other definitions of
solutions and modifications of objectives.

Consider a multivariate objective function $E(x_1, x_2, \ldots, x_n)$ of $n$ variables, or simply denoted
as $E(x)$, where each variable $x_i$ has a finite domain $D_i$ of size $|D_i|$. Assume that $E(x)$ can be
decomposed into $n$ sub-objective functions $E_i(x)$, denoted as $\{E_i(x)\}$, satisfying

1) $E(x) = E_1(x) + E_2(x) + \ldots + E_n(x)$,

2) $E_i(x)$, for $i = 1, 2, \ldots, n$, contains at least variable $x_i$,

3) the minimization of $E_i(x)$, for $i = 1, 2, \ldots, n$, is computationally manageable in complexity.

Let us assign $E_i(x)$ as the objective of agent $i$,

$$\mathrm{Objective}(i) = E_i(x), \quad \text{for } i = 1, 2, \ldots, n.$$

There are $n$ agents in the system, one agent for each sub-objective function.

Let the initial solution of agent $i$ be the minimization result of $E_i(x)$ defined as follows,

$$\mathrm{Solution}(i, t = 0) = \min_{X_i \setminus x_i} E_i(x),$$

where $X_i$ is the set of variables contained in $E_i(x)$, and $\min_{X_i \setminus x_i}$ stands for minimizing with
respect to all variables in $X_i$ excluding $x_i$. The solution is a unary function on variable $x_i$,
denoted as $\Psi_i^{(0)}(x_i)$.

Assume that the system takes discrete time with iteration step $k = 1, 2, 3, \ldots$. To simplify
notations, let $\tilde{E}_i^{(k)}(x)$ be the modified objective function of agent $i$ at iteration $k$, i.e.,

$$\tilde{E}_i^{(k)}(x) = \mathrm{Objective}(i, t = k).$$

It is also referred to as the $i$-th modified sub-objective of the system. The agent's solution at the
iteration is defined as

$$\mathrm{Solution}(i, t = k) = \min_{X_i \setminus x_i} \tilde{E}_i^{(k)}(x). \tag{1}$$

The solution is a unary function on variable $x_i$, denoted as $\Psi_i^{(k)}(x_i)$. It is the state of agent $i$
at iteration $k$. It can be represented as a vector of real values of size $|D_i|$, the domain size of
variable $x_i$. The $i$-th equation in (1) defines the dynamics of agent $i$. All the $n$ equations define
the dynamics of the system.

As described in the previous subsection, the cooperation among the agents in the system
is introduced by solution compromising via modifying the objective of each agent. Let agent $i$
define its modified objective function $\tilde{E}_i^{(k)}(x)$ at iteration $k$ as a linear combination of its original
objective $E_i(x)$ and the solutions of its neighbors at the previous iteration $k - 1$ as follows,

$$\tilde{E}_i^{(k)}(x) = (1 - \lambda_k) E_i(x) + \lambda_k \sum_{j \in \mathrm{Neighbors}(i)} w_{ij} \Psi_j^{(k-1)}(x_j), \tag{2}$$

where $\lambda_k$ and $w_{ij}$ are coefficients of the linear combination.

Agent $j$ is a neighbor of agent $i$ if variable $x_j$ of the same index $j$ is contained in agent $i$'s
objective function $E_i(x)$. (Based on this definition, agent $i$ is also a neighbor of itself. Such
a generalization is necessary because there is no restriction against having agent $i$ modify its objective
using its own solution.) The set of neighbors of agent $i$ is denoted as $\mathcal{N}(i)$, i.e., $\mathcal{N}(i) = \mathrm{Neighbors}(i)$.
Specifically, it is defined as the set of indices

$$\mathcal{N}(i) = \{ j \mid x_j \in X_i \}.$$

Substituting Eq. (2) into Eq. (1) and letting $w_{ij} = 0$ if $j \notin \mathcal{N}(i)$, the dynamics of the
cooperative system can be written as the following $n$ difference equations,

$$\Psi_i^{(k)}(x_i) = \min_{X_i \setminus x_i} \left( (1 - \lambda_k) E_i(x) + \lambda_k \sum_j w_{ij} \Psi_j^{(k-1)}(x_j) \right), \quad \text{for } i = 1, 2, \ldots, n. \tag{3}$$

Such a set of difference equations defines a basic cooperative optimization system (algorithm)
for minimizing an objective function of the form $\sum_i E_i(x)$.

At iteration $k$, variable $x_i$, for $i = 1, 2, \ldots, n$, has a value in the solution for minimizing the
$i$-th modified sub-objective function $\tilde{E}_i^{(k)}(x)$. It is denoted as $\tilde{x}_i^{(k)}$, i.e.,

$$\tilde{x}_i^{(k)} = \arg\min_{x_i} \min_{X_i \setminus x_i} \tilde{E}_i^{(k)}(x).$$

From (1), we have

$$\tilde{x}_i^{(k)} = \arg\min_{x_i} \Psi_i^{(k)}(x_i). \tag{4}$$

Agent $i$ is responsible for assigning that value to variable $x_i$. The assignments of other
variables are taken care of by other agents. All these values together form a solution of the
system at iteration $k$, denoted as $\tilde{x}^{(k)}$.

Putting everything together, the pseudo code of the algorithm is given in Figure 2.
The global optimality condition mentioned in line 7 will be discussed in detail later in this
paper.

Procedure Basic Cooperative Optimization Algorithm

1 Initialize the soft assignment function $\Psi_i^{(0)}(x_i)$, for each $i$;
2 for $k := 1$ to max_iteration do
3     for each $i$ do
          /* modify the i-th sub-objective function $E_i(x)$ */
4         $\tilde{E}_i^{(k)}(x) := (1 - \lambda_k) E_i(x) + \lambda_k \sum_j w_{ij} \Psi_j^{(k-1)}(x_j)$;
          /* minimize the modified sub-objective function */
5         $\Psi_i^{(k)}(x_i) := \min_{X_i \setminus x_i} \tilde{E}_i^{(k)}(x)$;
          /* find the best value for $x_i$ */
6         $\tilde{x}_i^{(k)} := \arg\min_{x_i} \Psi_i^{(k)}(x_i)$;
7     if $\tilde{x}^{(k)}$ is a global optimal solution return $\tilde{x}^{(k)}$;
8 return $\tilde{x}^{(k)}$; /* as an approximate solution */

Fig. 2. Basic cooperative optimization algorithm for minimizing an objective function of the form $E(x) = \sum_{i=1}^{n} E_i(x)$.
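
The procedure of Figure 2 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the author's code: the inner minimization over $X_i \setminus x_i$ is done by brute-force enumeration (only sensible for small scopes), the cooperation strength `lam` is held constant, the global optimality test of line 7 is omitted, and the function name, argument layout, and toy cost tables are all hypothetical.

```python
import itertools


def cooperative_optimize(sub_objectives, scopes, domains, W, lam, max_iter=20):
    """Iterate the difference equations (3) from the all-zero initial condition.

    sub_objectives[i] maps a dict {variable index: value} over scopes[i] to a cost;
    scopes[i] lists the variables contained in E_i (including i itself).
    """
    n = len(sub_objectives)
    psi = [{v: 0.0 for v in domains[i]} for i in range(n)]
    for _ in range(max_iter):
        new_psi = []
        for i in range(n):
            others = [j for j in scopes[i] if j != i]
            table = {}
            for v in domains[i]:
                best = float("inf")
                # Minimize over all variables in the scope except x_i.
                for combo in itertools.product(*(domains[j] for j in others)):
                    x = dict(zip(others, combo))
                    x[i] = v
                    modified = (1 - lam) * sub_objectives[i](x) + lam * sum(
                        W[i][j] * psi[j][x[j]] for j in scopes[i]
                    )
                    best = min(best, modified)
                table[v] = best
            new_psi.append(table)
        psi = new_psi  # all agents update simultaneously
    solution = [min(psi[i], key=psi[i].get) for i in range(n)]
    return solution, psi


# Toy chain of three binary variables; all costs are made up for illustration.
unary = [[0.0, 2.0], [0.0, 1.0], [2.0, 0.0]]


def pair(a, b):
    return 0.0 if a == b else 1.0


sub_objectives = [
    lambda x: unary[0][x[0]] + pair(x[0], x[1]) / 2,
    lambda x: unary[1][x[1]] + pair(x[0], x[1]) / 2 + pair(x[1], x[2]) / 2,
    lambda x: unary[2][x[2]] + pair(x[1], x[2]) / 2,
]
scopes = [[0, 1], [0, 1, 2], [1, 2]]
domains = [[0, 1]] * 3
# Non-negative weights whose columns sum to one, spread over the agents whose scope contains x_j.
W = [[1 / 2, 1 / 3, 0.0], [1 / 2, 1 / 3, 1 / 2], [0.0, 1 / 3, 1 / 2]]

solution, psi = cooperative_optimize(sub_objectives, scopes, domains, W, lam=0.5)
print(solution, sum(min(p.values()) for p in psi))
```

The second printed number is the minimum of $\sum_i \Psi_i^{(k)}(x_i)$, which, for this choice of initial condition and weights, never exceeds the minimum of $E(x)$.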

C. Cooperation Strength and Propagation Matrix 

The coefficient $\lambda_k$ in (3) controls the level of the cooperation among the agents at iteration $k$.
It is the so-called cooperation strength, satisfying $0 \le \lambda_k < 1$. From (3) we can see that, for each
agent, a high value for $\lambda_k$ will weigh the solutions of the other agents more than its own objective
$E_i(x)$. In other words, the agents in the system tend to compromise more with their solutions. As
a consequence, a strong level of cooperation is reached in this case. If the cooperation strength
$\lambda_k$ is of a small value, the cooperation among the agents is weak. In particular, if it is equal
to zero, there is no cooperation among the agents and each agent minimizes its own objective
function independently (see (3)).

The coefficients $w_{ij}$ control the propagation of solutions $\Psi_j^{(k-1)}(x_j)$, for $j = 1, 2, \ldots, n$,
as messages among the agents in the system. All $w_{ij}$s together form an $n \times n$ matrix called
the propagation matrix. To have $\sum_i E_i(x)$ as the objective function to be minimized, it is
required [33] that the propagation matrix $W = (w_{ij})_{n \times n}$ is non-negative and

$$\sum_{i=1}^{n} w_{ij} = 1, \quad \text{for } j = 1, 2, \ldots, n.$$

To have solutions $\Psi_j^{(k)}(x_j)$ uniformly propagated among all the agents, it is required [33]
that the propagation matrix $W$ is irreducible. A matrix $W$ is called reducible if there exists a
permutation matrix $P$ such that $P W P^T$ has the block form

$$\begin{pmatrix} A & B \\ O & C \end{pmatrix},$$

where $A$ and $C$ are square matrices and $O$ is a zero matrix.

The role of propagation matrices in basic cooperative optimization algorithms is exactly the same
as that of transition matrices in Markov chains (or random walks over directed graphs). In a
Markov chain, a transition matrix governs the distribution of states over time. In a basic cooperative
optimization algorithm, a propagation matrix governs the distribution of solutions among
agents. The mathematical foundation for analyzing Markov chains has been well established.
It can be directly applied to analyze the message propagation of cooperative optimization.
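
The two requirements on the propagation matrix (non-negative with unit column sums, and irreducible) can be checked mechanically. The sketch below is a hypothetical helper, not part of the paper; it relies on the standard fact that a non-negative matrix is irreducible exactly when the directed graph with an edge $i \to j$ for every $w_{ij} > 0$ is strongly connected.

```python
def has_unit_column_sums(W, tol=1e-9):
    """Check that W is non-negative and that every column sums to one."""
    n = len(W)
    non_negative = all(W[i][j] >= 0 for i in range(n) for j in range(n))
    columns_ok = all(abs(sum(W[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))
    return non_negative and columns_ok


def is_irreducible(W):
    """Check strong connectivity of the graph with an edge i -> j when W[i][j] > 0."""
    n = len(W)

    def reaches_all(edge):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in range(n):
                if edge(u, v) and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    # Strongly connected: node 0 reaches every node in the graph and in its reverse.
    return reaches_all(lambda u, v: W[u][v] > 0) and reaches_all(lambda u, v: W[v][u] > 0)


print(has_unit_column_sums([[0.5, 0.5], [0.5, 0.5]]), is_irreducible([[0.5, 0.5], [0.5, 0.5]]))
print(has_unit_column_sums([[1.0, 0.5], [0.0, 0.5]]), is_irreducible([[1.0, 0.5], [0.0, 0.5]]))
```

The second matrix satisfies the column-sum condition but is reducible, since nothing propagates from the second agent back to the first.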

D. Soft Decisions as Messages Passed Among Agents 

As mentioned before, the solution $\Psi_i^{(k)}(x_i)$ of agent $i$ at iteration $k$ is a unary function on
$x_i$ storing the solution of minimizing the $i$-th modified sub-objective function $\tilde{E}_i^{(k)}(x)$ (see (1)).
Given a value of $x_i$, $\Psi_i^{(k)}(x_i)$ is the minimal value of $\tilde{E}_i^{(k)}(x)$ with the variable $x_i$ fixed to that
value. To minimize $\tilde{E}_i^{(k)}(x)$, the values of $x_i$ which have smaller function values $\Psi_i^{(k)}(x_i)$ are
preferred more than those of higher function values. The best value for assigning the variable
$x_i$ is the one of the minimal function value $\Psi_i^{(k)}(x_i)$ (see (4)). Therefore, $\Psi_i^{(k)}(x_i)$ is inversely
related to the preferences over different values of $x_i$ for minimizing $\tilde{E}_i^{(k)}(x)$. It is the so-called
assignment constraint on variable $x_i$, an algorithm-introduced constraint on the variable. It can
also be viewed as a soft decision made by the agent for assigning the variable $x_i$ at iteration $k$.

In particular, a soft decision of agent $i$ falls back to a hard decision for assigning the variable
$x_i$ when the agent accepts only one value and rejects all the rest. Such a hard decision can
be represented by the assignment constraint $\Psi_i^{(k)}(x_i)$ as $\Psi_i^{(k)}(\tilde{x}_i) = 0$, for some $\tilde{x}_i \in D_i$, and
$\Psi_i^{(k)}(x_i) = \infty$ for any $x_i \ne \tilde{x}_i$.

With that insight, it can be understood now that the messages propagated among the
agents in a basic cooperative optimization system are the soft decisions for assigning variables.
An agent can make a better decision using soft decisions propagated from its neighbors than
using the hard ones instead. It is important to note that soft decision making is a critical feature of
cooperative optimization, which makes it fundamentally different from many classic optimization
methods where hard decisions are made for assigning variables.
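
The difference between the two kinds of messages can be made concrete with a small sketch. The representation (a dictionary from values to costs) and the function names are assumptions made for illustration only.

```python
import math


def best_value(psi):
    """Value of x_i with the minimal function value, as in Eq. (4)."""
    return min(psi, key=psi.get)


def harden(psi):
    """Collapse a soft decision into a hard one: 0 for the chosen value, infinity elsewhere."""
    chosen = best_value(psi)
    return {value: (0.0 if value == chosen else math.inf) for value in psi}


soft = {0: 2.5, 1: 0.7, 2: 0.9}  # made-up assignment constraint over three labels
print(best_value(soft))  # 1
print(harden(soft))      # {0: inf, 1: 0.0, 2: inf}
```

The soft decision still tells a neighbor that label 2 is almost as good as label 1; the hard decision discards that information.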

E. A Simple Example 

Given an objective function of the following form

$$\begin{aligned} E(x_1, x_2, x_3, x_4, x_5) = {} & f_1(x_1) + f_2(x_2) + f_3(x_3) + f_4(x_4) + f_5(x_5) \\ & + f_{1,2}(x_1, x_2) + f_{2,3}(x_2, x_3) + f_{3,4}(x_3, x_4) \\ & + f_{4,5}(x_4, x_5) + f_{1,5}(x_1, x_5) + f_{2,5}(x_2, x_5), \end{aligned} \tag{5}$$

where each variable is of a finite domain. The goal is to seek values (labels) of the five variables
such that the objective function is minimized.

Let us simply denote the function as

$$E(x) = f_1 + f_2 + f_3 + f_4 + f_5 + f_{1,2} + f_{2,3} + f_{3,4} + f_{4,5} + f_{1,5} + f_{2,5}.$$

To design a basic cooperative optimization algorithm to minimize the objective function, we
first decompose it into the following five sub-objective functions,

$$\begin{aligned} E_1(x_1, x_2, x_5) &= f_1 + f_{1,2}/2 + f_{1,5}/2; \\ E_2(x_1, x_2, x_3, x_5) &= f_2 + f_{1,2}/2 + f_{2,3}/2 + f_{2,5}/2; \\ E_3(x_2, x_3, x_4) &= f_3 + f_{2,3}/2 + f_{3,4}/2; \\ E_4(x_3, x_4, x_5) &= f_4 + f_{3,4}/2 + f_{4,5}/2; \\ E_5(x_1, x_2, x_4, x_5) &= f_5 + f_{1,5}/2 + f_{2,5}/2 + f_{4,5}/2. \end{aligned}$$

A propagation matrix $W$ of dimensions $5 \times 5$ can be chosen as

$$W = (w_{ij})_{5 \times 5} \tag{6}$$

[The entries of the matrix in (6) are not recoverable from the source.]

With the decomposition and the propagation matrix, substituting them into (3) we have a
basic cooperative optimization algorithm with five difference equations for minimizing the five
sub-objective functions in an iterative and cooperative way.
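
For readers who want a concrete matrix to experiment with, the sketch below builds one admissible propagation matrix from the scopes of the five sub-objective functions above. It is an assumption made for illustration and not necessarily the matrix of Eq. (6): each column $j$ is spread uniformly over the agents whose sub-objective contains $x_j$, so the matrix is non-negative and every column sums to one.

```python
from fractions import Fraction

# Variable indices (1-based, as in the text) contained in each sub-objective E_i.
scopes = {1: {1, 2, 5}, 2: {1, 2, 3, 5}, 3: {2, 3, 4}, 4: {3, 4, 5}, 5: {1, 2, 4, 5}}


def uniform_propagation_matrix(scopes):
    """Return W with w_ij = 1 / (number of agents whose scope contains x_j) if x_j is in X_i, else 0."""
    n = len(scopes)
    W = [[Fraction(0)] * n for _ in range(n)]
    for j in range(1, n + 1):
        receivers = [i for i in range(1, n + 1) if j in scopes[i]]
        for i in receivers:
            W[i - 1][j - 1] = Fraction(1, len(receivers))
    return W


W = uniform_propagation_matrix(scopes)
for row in W:
    print("  ".join(f"{w!s:>3}" for w in row))
```

Exact fractions are used so that the column-sum condition can be verified without rounding error.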

F. Basic Canonical Form as Generalization

Replacing $\Psi_i^{(k)}(x_i)$ by $(1 - \lambda_k)\Psi_i^{(k)}(x_i)$ in the difference equations (3), we have the basic
canonical form of cooperative optimization as

$$\Psi_i^{(k)}(x_i) = \min_{X_i \setminus x_i} \left( E_i(x) + \lambda_k \sum_j w_{ij} \Psi_j^{(k-1)}(x_j) \right), \quad \text{for } i = 1, 2, \ldots, n. \tag{7}$$

The basic form of cooperative optimization (3) has its cooperation strength $\lambda_k$ restricted to
$0 \le \lambda_k < 1$. This is because its difference equations (3) do not make sense when $\lambda_k \ge 1$. However,
such a restriction can be relaxed to $0 \le \lambda_k$ for the basic canonical form (7). Often
in practice, the basic canonical form is preferred over the basic one because the cooperation
strength $\lambda_k$ in the former has a broader range to choose from to maximize performance.

III. Cooperative Optimization as Lower Bounding Technique

A. Bound Function Tightening Technique for Optimization

In principle, a basic cooperative optimization algorithm can be understood as a lower bounding
technique for finding global minima. It first initializes a function of some form as a lower bound
function to an objective function. One may intentionally choose a form for the lower bound
function such that the minimization of the function is simple in computation. Following that,
the algorithm progressively tightens the lower bound function until its global minimum touches
the global minimum of the original objective function. The latter is then found by searching the
former instead (see the illustration in Fig. 3).

Specifically, let the objective function to be minimized be $E(x)$. Assume that the initial
lower bound function is $E_{-}^{(0)}(x)$, with $E_{-}^{(0)}(x) \le E(x)$. From $E_{-}^{(0)}(x)$, assume that the algorithm
progressively tightens the function in an iterative way such that

$$E_{-}^{(0)}(x) \le E_{-}^{(1)}(x) \le \ldots \le E_{-}^{(k)}(x) \le E(x),$$



where $k$ is the iteration number.

Fig. 3. The global minimum of a complex multivariate objective function can be found by progressively tightening a lower
bound function of some simple form until its global minimum touches the one of the original objective function.

Let the global minimum of the lower bound function $E_{-}^{(k)}(x)$ at iteration $k$ be $\tilde{x}^{(k)}$. Finding
$\tilde{x}^{(k)}$ is simple in computation due to the simple form of the lower bound function $E_{-}^{(k)}(x)$.
Suppose that, at iteration $k$, the algorithm finds that the lower bound function $E_{-}^{(k)}(x)$ at the solution $\tilde{x}^{(k)}$ has
the same function value as the original objective function $E(x)$, i.e.,

$$E_{-}^{(k)}(\tilde{x}^{(k)}) = E(\tilde{x}^{(k)}).$$

In other words, the two functions touch each other at the point where $x = \tilde{x}^{(k)}$ in the search
space. Then $\tilde{x}^{(k)}$ must also be the global minimum of $E(x)$ simply because

$$E(\tilde{x}^{(k)}) = E_{-}^{(k)}(\tilde{x}^{(k)}) \le E_{-}^{(k)}(x) \le E(x), \quad \text{for any } x. \tag{8}$$

Such a condition implies that the lower bound function $E_{-}^{(k)}(x)$ has been tightened enough such
that its global minimum $\tilde{x}^{(k)}$ touches the global minimum of the original objective function
$E(x)$. The latter is thus found by searching the former instead.

Such a lower bounding technique is the so-called bound function tightening technique for
optimization. There are other lower bounding techniques based on principles different from this
one. Examples are Lagrangian relaxation techniques, cutting plane techniques, branch-and-bound
algorithms, and branch-and-cut algorithms.

B. Basic Form as Lower Bounding Technique

In light of the bound function tightening technique described in the previous subsection, we 
can derive the basic form of cooperative optimization based on a simple form of lower bound 
functions. The derivation can be generalized further in a straightforward way to any other forms 
of lower bound functions. 

Consider an objective function of $n$ variables, $E(x_1, x_2, \ldots, x_n)$, or simply denoted as $E(x)$.
Assume that $E_{-}(x)$ is a lower bound function of $E(x)$ defined on the same set of variables.
Obviously the linear combination of the two functions,

$$(1 - \lambda) E(x) + \lambda E_{-}(x), \tag{9}$$

defines a new lower bound function of $E(x)$ if the parameter $\lambda$ satisfies $0 \le \lambda < 1$.
Let us choose a simple form for the lower bound function as

$$E_{-}(x) = \Psi_1(x_1) + \Psi_2(x_2) + \ldots + \Psi_n(x_n), \tag{10}$$

where $\Psi_i(x_i)$ is a unary component function defined on variable $x_i$, for $i = 1, 2, \ldots, n$. Its global
minimum, denoted as $\tilde{x}$, can be easily found by minimizing the unary component functions $\Psi_i(x_i)$
independently as

$$\tilde{x}_i = \arg\min_{x_i} \Psi_i(x_i), \quad \text{for } i = 1, 2, \ldots, n.$$

Assume that the objective function $E(x)$ can be decomposed into $n$ sub-objective functions,

$$E(x) = E_1(x) + E_2(x) + \ldots + E_n(x).$$

The lower bound function $E_{-}(x)$ can also be easily decomposed into $n$ sub-functions as follows

$$E_{-}(x) = \sum_{i=1}^{n} \sum_j w_{ij} \Psi_j(x_j), \quad \text{where } w_{ij} \ge 0 \text{ and } \sum_i w_{ij} = 1, \text{ for } 1 \le i, j \le n.$$

Based on the two decompositions, the new lower bound function (9) can be rewritten as

$$\sum_{i=1}^{n} \left( (1 - \lambda) E_i(x) + \lambda \sum_j w_{ij} \Psi_j(x_j) \right). \tag{11}$$

To put the above function in a simple form, let

$$\tilde{E}_i(x) = (1 - \lambda) E_i(x) + \lambda \sum_j w_{ij} \Psi_j(x_j).$$

Then it can be rewritten simply as

$$\sum_{i=1}^{n} \tilde{E}_i(x).$$

In the above sum, let $X_i$ be the set of variables contained in the $i$-th component function $\tilde{E}_i(x)$.
If we minimize the function with respect to all variables in $X_i$ except for $x_i$, we obtain a unary
function defined on $x_i$, denoted as $\Psi'_i(x_i)$, i.e.,

$$\Psi'_i(x_i) = \min_{X_i \setminus x_i} \tilde{E}_i(x). \tag{12}$$

The sum of those unary functions defines another lower bound function of $E(x)$, denoted as
$E'_{-}(x)$, i.e.,

$$E'_{-}(x) = \sum_{i=1}^{n} \Psi'_i(x_i) \le E(x).$$

This new lower bound function has exactly the same form as the original one $E_{-}(x) = \sum_i \Psi_i(x_i)$.
Therefore, from a lower bound function $E_{-}(x)$ of the form $\sum_i \Psi_i(x_i)$, we can compute another
lower bound function $E'_{-}(x)$ of the same form. Such a process can be repeated and we can have
an iterative algorithm to compute new lower bound functions.

Rewriting Eq. (12) in an iterative format, we have

$$\Psi_i^{(k)}(x_i) = \min_{X_i \setminus x_i} \left( (1 - \lambda_k) E_i(x) + \lambda_k \sum_j w_{ij} \Psi_j^{(k-1)}(x_j) \right), \tag{13}$$

where $k$ is the iteration step, $k = 1, 2, 3, \ldots$. The above $n$ difference equations define a basic
cooperative optimization algorithm for minimizing an objective function $E(x)$ of the form
$\sum_i E_i(x)$.

The solution at iteration $k$, denoted as $\tilde{x}^{(k)}$, is defined as the global minimal solution of the
lower bound function $E_{-}^{(k)}(x)$ at the iteration, i.e.,

$$\tilde{x}^{(k)} = \arg\min_x E_{-}^{(k)}(x),$$

which can be easily obtained as

$$\tilde{x}_i^{(k)} = \arg\min_{x_i} \Psi_i^{(k)}(x_i), \quad \text{for } i = 1, 2, \ldots, n. \tag{14}$$

If $E_{-}^{(k)}(\tilde{x}^{(k)}) = E(\tilde{x}^{(k)})$ at some iteration $k$, then the solution $\tilde{x}^{(k)}$ must be the global minimum
of the original objective function $E(x)$.

Without loss of generality, we assume in the following discussions that all sub-objective
functions $E_i(x)$ are nonnegative. One may choose the initial condition as $\Psi_i^{(0)}(x_i) = 0$, for
any value of $x_i$ and $i = 1, 2, \ldots, n$. The parameter $\lambda_k$ can be varied from one iteration to
another. If it is of a constant value and the above initial condition has been chosen, cooperative
optimization theory [33] tells us that the lower bound function $E_{-}^{(k)}(x)$ is monotonically
non-decreasing as shown in (8).
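
The test "the lower bound touches the objective at the candidate solution" can be sketched as follows. The helper name and the list-of-dictionaries representation of the unary functions $\Psi_i$ are assumptions made for illustration, and the check is only meaningful when $\sum_i \Psi_i(x_i)$ is known to be a lower bound of $E(x)$, as it is under the conditions described above.

```python
def lower_bound_touches(psi, objective, tol=1e-9):
    """Return (candidate, bound, touches) for unary functions psi[i] = {value: Psi_i(value)}.

    candidate minimizes each Psi_i independently, as in Eq. (14); bound is the value of
    the lower bound function at the candidate; touches says whether it equals E(candidate).
    """
    candidate = [min(p, key=p.get) for p in psi]
    bound = sum(p[v] for p, v in zip(psi, candidate))
    return candidate, bound, abs(objective(candidate) - bound) <= tol


# Made-up unary functions and objective values, for illustration only.
psi = [{0: 0.0, 1: 1.0}, {0: 0.5, 1: 0.2}]
print(lower_bound_touches(psi, lambda x: 0.2 if x == [0, 1] else 5.0))  # ([0, 1], 0.2, True)
print(lower_bound_touches(psi, lambda x: 1.0))                          # ([0, 1], 0.2, False)
```

In the first call the bound equals the objective at the candidate, so by (8) the candidate would be a global minimum; in the second call the bound is still loose and nothing can be concluded.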

IV. Computational Properties

A. General Convergence Properties of Cooperative Optimization

It has been shown that a basic cooperative optimization algorithm (3) has some important
computational properties [33]. Given a constant cooperation strength $\lambda$, i.e., $\lambda_k = \lambda$ for all $k$,
the algorithm has one and only one equilibrium. It always converges to the unique equilibrium
at an exponential rate regardless of initial conditions and perturbations. The two convergence
theorems proved in [33] are very important and so they are listed here again. One formally
describes the existence and the uniqueness of the equilibrium of the algorithm, and the other
reveals the convergence property of the algorithm.

Theorem 4.1: A basic cooperative optimization algorithm with a constant cooperation strength
$\lambda$ ($0 \le \lambda < 1$) has one and only one equilibrium. That is, the difference equations (3) of the
algorithm have one and only one solution (equilibrium), denoted as a vector
$(\Psi_1^{(\infty)}, \Psi_2^{(\infty)}, \ldots, \Psi_n^{(\infty)})^T$, or simply $\Psi^{(\infty)}$.

Theorem 4.2: A basic cooperative optimization algorithm with a constant cooperation strength
$\lambda$ ($0 \le \lambda < 1$) converges exponentially to its unique equilibrium $\Psi^{(\infty)}$ with the rate $\lambda$ for any
choice of the initial condition $\Psi^{(0)}$. That is,

$$\|\Psi^{(k)} - \Psi^{(\infty)}\|_\infty \le \lambda^k \|\Psi^{(0)} - \Psi^{(\infty)}\|_\infty, \tag{15}$$

where $\|x\|_\infty$ is the maximum norm of the vector $x$ defined as

$$\|x\|_\infty = \max_i |x_i|.$$
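
The independence from initial conditions can be observed numerically on a toy problem. The sketch below is an illustration under assumptions, not an experiment from the paper: it runs the difference equations (3) from two different initial conditions and records the maximum-norm distance between the two trajectories, which should shrink by at least a factor $\lambda$ per iteration if both converge to the same equilibrium at rate $\lambda$. The toy costs are made up, and the propagation matrix is chosen so that its rows as well as its columns sum to one.

```python
import itertools


def update(psi, sub_objectives, scopes, domains, W, lam):
    """Apply the difference equations (3) once to every agent."""
    new_psi = []
    for i in range(len(psi)):
        others = [j for j in scopes[i] if j != i]
        table = {}
        for v in domains[i]:
            best = float("inf")
            for combo in itertools.product(*(domains[j] for j in others)):
                x = dict(zip(others, combo))
                x[i] = v
                value = (1 - lam) * sub_objectives[i](x) + lam * sum(
                    W[i][j] * psi[j][x[j]] for j in scopes[i]
                )
                best = min(best, value)
            table[v] = best
        new_psi.append(table)
    return new_psi


def max_norm_distance(a, b):
    """Maximum absolute difference between two collections of unary functions."""
    return max(abs(a[i][v] - b[i][v]) for i in range(len(a)) for v in a[i])


# Toy problem with two binary variables; all numbers are made up.
unary = [[0.0, 1.0], [0.7, 0.0]]
pair = lambda a, b: 0.0 if a == b else 1.0
sub_objectives = [
    lambda x: unary[0][x[0]] + pair(x[0], x[1]) / 2,
    lambda x: unary[1][x[1]] + pair(x[0], x[1]) / 2,
]
scopes = [[0, 1], [0, 1]]
domains = [[0, 1], [0, 1]]
W = [[0.5, 0.5], [0.5, 0.5]]  # rows and columns both sum to one (an assumption of this sketch)
lam = 0.5

psi_a = [{0: 0.0, 1: 0.0}, {0: 0.0, 1: 0.0}]
psi_b = [{0: 3.0, 1: -1.0}, {0: 0.5, 1: 4.0}]
distances = [max_norm_distance(psi_a, psi_b)]
for _ in range(10):
    psi_a = update(psi_a, sub_objectives, scopes, domains, W, lam)
    psi_b = update(psi_b, sub_objectives, scopes, domains, W, lam)
    distances.append(max_norm_distance(psi_a, psi_b))
print(distances)
```

The printed distances start at 4.0 and decay geometrically, so after a few iterations the two runs are numerically indistinguishable.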

The two theorems indicate that every basic cooperative optimization algorithm (3) is stable
and has a unique attractor, $\Psi^{(\infty)}$. Hence, the evolution of the algorithms is robust and insensitive to
perturbations. The final solution of the algorithms is independent of their initial conditions. In
contrast, the conventional algorithms based on iterative local improvement of solutions may have
many local attractors due to the local minima problem. The evolution of those local optimization
algorithms is sensitive to perturbations, and the final solution of those algorithms is dependent
on their initial conditions.

Furthermore, the basic cooperative optimization algorithms (3) possess a number of global
optimality conditions for identifying global optima. They know whether a solution they found
is a global optimum so that they can terminate their search process efficiently. However, this
statement does not imply that NP=P because a basic cooperative optimization algorithm can
only verify within a polynomial time whether a solution it found is a global optimum or not. It
cannot decide the global optimality for any given solution other than those it found.

It is important to note that a basic canonical cooperative optimization algorithm (7) may no
longer possess the unique equilibrium property when its cooperation strengths at some iterations
are greater than one, i.e., $\lambda_k > 1$ for some $k$. In this case, the algorithm may have multiple
equilibria. It can evolve into any one of them depending on its initial settings of the assignment
constraints $\Psi_i^{(0)}(x_i)$ ($1 \le i \le n$).

B. Consensus Solution and Solution Consensus in Distributed Optimization 

As described before, a basic cooperative optimization algorithm is defined by the $n$ difference
equations ©. The $i$-th equation defines the minimization of the $i$-th modified sub-objective
function $E_i^{(k)}(x)$ (defined in ©). Any given variable, say $x_i$, may be contained in several
modified sub-objective functions. At each iteration, $x_i$ has a value in the optimal solution for
minimizing each of the modified sub-objective functions containing the variable. Those values
may not be the same. If all of them are the same at some iteration, we say that the
cooperative optimization algorithm reaches a consensus assignment for that variable. Moreover, if
a consensus assignment is reached for every variable of the problem at hand at some iteration,
we say that the minimization of the $n$ modified sub-objective functions reaches a solution consensus.
That is, there is no conflict among the solutions in terms of variable assignments for minimizing
those functions. In this case, those consensus assignments form a solution, called a consensus
solution, and the algorithm is said to reach a consensus solution.

To be more specific, consider $n$ modified sub-objective functions $E_i(x)$, for $i = 1, 2, \ldots, n$ (to
simplify notation, let us drop the superscript $k$ temporarily). Let the optimal solution of the $i$-th
modified sub-objective function be $\tilde{x}(E_i)$, i.e.,

$$\tilde{x}(E_i) = \arg\min_{x} E_i(x) .$$

Assume that variable $x_i$ is contained in both the $j$-th and the $k$-th modified sub-objective functions
$E_j(x)$, $E_k(x)$. However, it is not necessary that

$$\tilde{x}_i(E_j) = \tilde{x}_i(E_k) .$$

Given a variable $x_i$, if the above equality holds for any $j$ and $k$ where $E_j(x)$ and $E_k(x)$ contain
$x_i$, then a consensus assignment is reached for that variable, with the assignment value denoted
as $\tilde{x}_i$. Moreover, if the above statement is true for every variable, we say that the minimization of all
the $E_i(x)$ reaches a solution consensus. The solution $\tilde{x}$ with $\tilde{x}_i$ as the value of variable $x_i$ is called
a consensus solution.

As defined before, $X_i$ stands for the set of variables contained in the function $E_i(x)$. $X_i$ is
a subset of variables, i.e., $X_i \subseteq X = \{x_1, x_2, \ldots, x_n\}$. Let $\tilde{x}(X_i)$ stand for the restriction of a
solution $\tilde{x}$ on $X_i$. Another way to recognize a consensus solution $\tilde{x}$ is to check if $\tilde{x}(X_i)$, for any
$i$, is the global minimum of $E_i(x)$, i.e.,

$$\tilde{x}(X_i) = \arg\min_{x} E_i(x), \quad \text{for any } i .$$

Simply put, a solution is a consensus one if it is the global minimum of every modified 
sub-objective function. 
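
The consensus test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_consensus` and the dictionary representation of each sub-problem's minimizer are hypothetical, and each modified sub-objective function is assumed to have a single minimizer that has already been computed.

```python
def find_consensus(local_solutions):
    """Merge per-sub-problem minimizers; return None on any conflict.

    local_solutions: one dict per modified sub-objective function, mapping a
    variable name to the value it takes in that function's minimizer.
    """
    merged = {}
    for solution in local_solutions:
        for variable, value in solution.items():
            # setdefault stores the first value seen and returns the stored one
            if merged.setdefault(variable, value) != value:
                return None  # two sub-problems disagree on this variable
    return merged


# Hypothetical minimizers of three sub-problems that share variables.
agree = [{"x1": 0, "x2": 1}, {"x2": 1, "x3": 2}, {"x1": 0, "x3": 2}]
conflict = [{"x1": 0, "x2": 1}, {"x2": 0, "x3": 2}]

print(find_consensus(agree))     # a consensus solution
print(find_consensus(conflict))  # no solution consensus
```

When every shared variable receives the same value, the merged assignment is the consensus solution; a single disagreement means no solution consensus has been reached at that iteration.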

C. Consensus Solution in Cooperative Optimization 

Consensus solution is an important concept of cooperative optimization. If a consensus solution 
is found at some iteration or iterations, then we can find out the closeness between the consensus 
solution and the global optimal solution in cost. The following theorem from [33] makes these 
points clearer. 

Theorem 4.3: Let

$$E_{-}^{*(k)} = \sum_{i=1}^{n} \Psi_i^{(k)}(\tilde{x}_i^{(k)}) , \quad \text{where } \tilde{x}_i^{(k)} = \arg\min_{x_i} \Psi_i^{(k)}(x_i) .$$

Given any propagation matrix $W$, and the general initial condition $\Psi_i^{(0)}(x_i) = 0$, for each $i$, or
$\lambda_1 = 0$. If a consensus solution $\tilde{x}$ is found at iteration $k_1$ and remains the same from iteration
$k_1$ to iteration $k_2$, then the closeness between the cost of $\tilde{x}$, $E(\tilde{x})$, and the optimal cost, $E^*$,
satisfies the following two inequalities,

$$0 \le E(\tilde{x}) - E^* \le \left( \prod_{k=k_1}^{k_2} \lambda_k \right) \left( E(\tilde{x}) - E_{-}^{*(k_1-1)} \right) , \qquad (16)$$

$$0 \le E(\tilde{x}) - E^* \le \frac{\prod_{k=k_1}^{k_2} \lambda_k}{1 - \prod_{k=k_1}^{k_2} \lambda_k} \left( E^* - E_{-}^{*(k_1-1)} \right) , \qquad (17)$$

where $(E^* - E_{-}^{*(k_1-1)})$ is the difference between the optimal cost $E^*$ and the lower bound on
the optimal cost, $E_{-}^{*(k_1-1)}$, obtained at iteration $k_1 - 1$.

In particular, if $1 - \lambda_k \ge \epsilon > 0$, for $k_1 \le k \le k_2$, when $k_2 - k_1 \to \infty$,

$$E(\tilde{x}) \to E^* .$$

That is, the consensus solution $\tilde{x}$ must be the global minimum of $E(x)$, i.e., $\tilde{x} = x^*$.

The consensus solution is also an important concept of cooperative optimization for defining global
optimality conditions. The cooperative optimization theory tells us that a consensus solution can
be the global minimal solution. As mentioned in the previous subsection, a basic cooperative
optimization algorithm has one and only one equilibrium given a constant cooperation strength. If
a cooperative optimization algorithm reaches an equilibrium after some number of iterations and
a consensus solution is found at the same time, then the consensus solution must be the global
minimal solution, guaranteed by theory. The following theorem (with its proof in the appendix)
establishes the connection between a consensus solution and a global optimal solution.

Theorem 4.4: Assume that a basic cooperative optimization algorithm © reaches its equilibrium at
some iteration, denoted as $\Psi^{(\infty)}$. That is, $\Psi^{(\infty)}$ is a solution to the difference equations ©. If
a consensus solution $\tilde{x}$ is found at the same iteration, then it must be the global minimum of
$E(x)$, i.e., $\tilde{x} = x^*$.

Besides the basic global optimality condition given in the above theorem, a few more
are offered in [33] for identifying global optimal solutions. The capability of recognizing global
optima is a critical property for any optimization algorithm. Without any global optimality
condition, it is hard for an optimization algorithm to know where to find global optimal
solutions and whether a solution it found is a global optimum. Finding ways of identifying global
optima for any optimization algorithm is of both practical interest and theoretical
importance.

D. Further Generalization of Convergence Properties

The convergence theorem 4.3 can be generalized further to any initial conditions for $\Psi_i^{(0)}(x_i)$
and $\lambda_1$, and to any cooperation strength series $\{\lambda_k\}_{k \ge 1}$. Dropping the restriction on the initial
conditions $\Psi_i^{(0)}(x_i)$ and $\lambda_1$ in the theorem, from the difference equations ©, we have

$$E^* - E_{-}^{*(k_2)} = \left( \prod_{k=k_1}^{k_2} \lambda_k \right) \left( E^* - E_{-}^{*(k_1-1)} \right) . \qquad (18)$$

It is obvious from the above equation that $E_{-}^{*(k_2)}$ still approaches $E^*$ exponentially with the rate
$\lambda$ when the cooperation strength $\lambda_k$ is of a constant value $\lambda$ ($0 \le \lambda < 1$).

When the cooperation strength $\lambda_k$ is not of a constant value $\lambda$, the convergence to the global
optimum is still guaranteed as long as the series $\sum_{k \ge 1} (1 - \lambda_k)$ is divergent.

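Lemma 4.1 (Infinite Products): Let $\{\lambda_k\}_{k \ge 1}$ be a sequence of numbers of the interval $[0, 1)$.

1) If $\sum_{k=1}^{\infty} (1 - \lambda_k) < \infty$, then

$$\lim_{n \to \infty} \prod_{k=1}^{n} \lambda_k > 0 .$$

2) If $\sum_{k=1}^{\infty} (1 - \lambda_k) = \infty$, then

$$\lim_{n \to \infty} \prod_{k=1}^{n} \lambda_k = 0 .$$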

The proof of the lemma is offered in Appendix. 
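
The two cases of the lemma can be illustrated numerically. The sketch below is not part of the proof; the two schedules are hypothetical examples chosen because their partial products telescope to closed forms, $(n+2)/(2(n+1))$ and $1/(n+1)$ respectively.

```python
def partial_product(strength, n):
    """Return the product of strength(k) for k = 1, ..., n."""
    product = 1.0
    for k in range(1, n + 1):
        product *= strength(k)
    return product


def summable_gap(k):
    # 1 - lambda_k = 1/(k+1)^2, so the series of gaps converges
    return 1.0 - 1.0 / (k + 1) ** 2


def harmonic_gap(k):
    # 1 - lambda_k = 1/(k+1), so the series of gaps diverges
    return 1.0 - 1.0 / (k + 1)


for n in (10, 100, 1000):
    print(n, partial_product(summable_gap, n), partial_product(harmonic_gap, n))
```

With the summable gaps the product settles near 1/2, while with the harmonic-like gaps it keeps shrinking toward zero, which is the behavior cases 1) and 2) describe.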

From the above lemma and Eq. (18), the convergence theorem 4.3 can be generalized further
as follows.

Theorem 4.5: Given any initial conditions, assume that a consensus solution $\tilde{x}$ is found by
a basic cooperative optimization algorithm at some iteration $k$ and remains the same in the
following iterations. If the series

$$(1 - \lambda_1) + (1 - \lambda_2) + \cdots + (1 - \lambda_k) + \cdots \qquad (19)$$

is divergent, then

$$E(\tilde{x}) = E^* .$$

That is, the consensus solution $\tilde{x}$ must be the global minimal solution $x^*$, i.e., $\tilde{x} = x^*$.

If $1 - \lambda_k = 1/k$, for instance, the series (19) is the harmonic series,

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots .$$

The harmonic series is divergent. Hence, with the choice of $\lambda_k = 1 - 1/k$, if a consensus solution
$\tilde{x}$ is found at some iteration and it remains the same in the following iterations, it must be the
global minimal solution $x^*$.

If $\{1 - \lambda_k\}_{k \ge 1}$, as another example, is a convergent sequence with a positive limit, then
$\sum_k (1 - \lambda_k)$ is divergent. In this case, a consensus solution is also the global minimal solution. This
statement can be generalized further to Cauchy sequences. Every convergent sequence is a
Cauchy sequence, and every Cauchy sequence is bounded. Thus, if $\{1 - \lambda_k\}$ is a Cauchy sequence
with a positive lower bound, a consensus solution is the global minimal solution.

To maximize the performance of a cooperative optimization algorithm, it is common in
experiments to progressively increase the cooperation strength as the iterations of the algorithm
proceed. A weak cooperation level at the beginning leads to a fast convergence rate (see
Theorem 4.2). A strong cooperation level at a later stage of the iterations increases the chance
of finding a consensus solution. Theorem 4.5 offers some general guidance and justification
for choosing a variable cooperation strength. It tells us that the cooperation
strength should not be increased too fast if we want the guarantee that a consensus solution is the global
optimal one.

V. General Canonical Form of Cooperative Optimization 

By combining different forms of lower bound functions and different ways of decomposing
objective functions, we can design cooperative optimization algorithms of different complexities
and powers for attacking different optimization problems. The basic canonical form of cooperative
optimization © can be generalized further in a straightforward way to the general canonical
one as follows.

Consider a multivariate objective function $E(x_1, x_2, \ldots, x_n)$ of $n$ variables, denoted simply
as $E(x)$, where each variable has a finite domain. Assume that $E(x)$ can be decomposed into
$m$ sub-objective functions $E_i(x)$ which may satisfy the condition

$$E(x) = \sum_{i=1}^{m} E_i(x) .$$

One may define another function $E_{-}(x)$, on the same set of variables as $E(x)$, as the
composition of $m$ component functions as follows,

$$E_{-}(x) = \sum_{i=1}^{m} \Psi_i(x^i) ,$$

where $\Psi_i(x^i)$ is a component function defined on a subset of variables $X'_i$, $X'_i \subset \{x_1, x_2, \ldots, x_n\}$,
for $i = 1, 2, \ldots, m$. $x^i$ is the restriction of $x$ on $X'_i$, denoted as $x^i = x(X'_i)$.

A cooperative optimization algorithm of the general canonical form is defined as minimizing
the $m$ sub-objective functions $E_i(x)$ in the following iterative and cooperative way,

$$\Psi_i^{(k)}(x^i) = \min_{X_i \setminus X'_i} \left( E_i(x) + \lambda_k \sum_{j} w_{ij} \Psi_j^{(k-1)}(x^j) \right) , \qquad (20)$$

for $i = 1, 2, \ldots, m$. In the equations, $k$ ($= 1, 2, 3, \ldots$) is the iteration step; $X_i$ is the set of
variables contained in the functions at the right side of the $i$-th equation; $\lambda_k$ is a real value
parameter at iteration $k$ satisfying $\lambda_k \ge 0$; and $w_{ij}$ ($1 \le i, j \le m$) are also real value parameters
satisfying $w_{ij} \ge 0$.

The solution at iteration $k$ is defined as

$$\tilde{x}^{(k)} = \arg\min_{x} E_{-}^{(k)}(x) .$$

Moreover, $\tilde{x}^{(k)}$ is called a consensus solution if it is the conditional optimum of all the $m$
minimization problems defined in (20). That is,

$$\tilde{x}^{(k)}(X_i) = \arg\min_{X_i} \left( E_i(x) + \lambda_k \sum_{j} w_{ij} \Psi_j^{(k-1)}(x^j) \right) ,$$

when $x(X'_i) = \tilde{x}^{(k)}(X'_i)$ and $i = 1, 2, \ldots, m$.

One may choose the parameters $\lambda_k$ and $w_{ij}$ in such a way that they further satisfy the conditions
$\sum_i w_{ij} = 1$, for all $j$, and all $\lambda_k$ are less than one ($\lambda_k < 1$). With these settings, if the algorithm
reaches its equilibrium at some iteration and the solution of the iteration is also a consensus one,
then it must be the global minimal solution. (This global optimality condition can be proved in
exactly the same way as Theorem 4.4.)

The general canonical form can be further generalized to variable propagation matrices, 
variable forms of lower bound functions, and variable ways of decomposing objective functions. 

VI. Design Issues 

A basic cooperative optimization algorithm © (or a basic canonical one ©) is uniquely
defined by the objective function decomposition $\{E_i(x)\}$, the cooperation strength series $\{\lambda_k\}_{k \ge 1}$,
and the propagation matrix $(w_{ij})_{n \times n}$. Some general guidelines for designing the cooperation
strength series have been discussed in the previous section. This section focuses on the other two.

A. Objective Function Decomposition

1) Constraint Optimization Problems: A large class of optimization problems have objective
functions of the following form,

$$E(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} f_i(x_i) + \sum_{(i,j) \in \mathcal{M}} f_{ij}(x_i, x_j) . \qquad (21)$$

The function $f_i(x_i)$ is a unary function on variable $x_i$, for $i = 1, 2, \ldots, n$, and the function
$f_{ij}(x_i, x_j)$ is a binary function on two variables $x_i$, $x_j$. To denote the collection of all defined binary
functions, the set $\mathcal{M}$ is used, which contains unordered pairs of variable indices where each
pair corresponds to a defined binary function $f_{ij}(x_i, x_j)$.

The above optimization problems are also referred to as binary constraint optimization problems
(binary COP) in AI. The unary function $f_i(x_i)$ is called a unary constraint on variable $x_i$
and the binary function $f_{ij}(x_i, x_j)$ is called a binary constraint on variables $x_i$, $x_j$.

Binary constraint optimization problems are a very general formulation for many optimization
problems arising in widely different fields. Examples are the famous traveling salesman problems,
weighted maximum satisfiability problems, quadratic variable assignment problems, stereo
vision, image segmentation, and many more. Solving a binary constraint optimization problem
is NP-hard.

2) Graphical Representation of Objective Functions: An objective function in the form of (21) can
be represented with an undirected graph $G = (V, E)$. In the graph, each variable $x_i$ is represented
by a node, called a variable node, $V = \{x_1, \ldots, x_n\}$; each binary constraint $f_{ij}(x_i, x_j)$ is
represented by an undirected edge connecting the variable nodes $x_i$ and $x_j$, denoted by an
unordered pair of variable nodes $(x_i, x_j)$. By definition, the set $E$ of the edges of the graph $G$ is
$E = \{(x_i, x_j) \mid (i, j) \in \mathcal{M}\}$.

The simple example described in Subsection III-E is a binary constraint optimization problem.
The objective function © of the simple example has the form of (21). It can be represented by
an undirected graph as shown in Figure 4.

If edge $(x_i, x_j) \in E$, then the variable nodes $x_i$, $x_j$ are called neighbors of each other. In graph
theory, they are also called adjacent to each other. Each variable node $x_i$ can have a number of
neighboring variable nodes. Let $\mathcal{N}(i)$ be the set of the indices of the neighboring variables of
$x_i$. By definition,

$$\mathcal{N}(i) = \{ j \mid (x_i, x_j) \in E \} .$$

Fig. 4. The graphical representation of the objective function of the simple example.

Using the notations, we can rewrite the objective function (21) as

$$E(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \left( f_i(x_i) + \frac{1}{2} \sum_{j \in \mathcal{N}(i)} f_{ij}(x_i, x_j) \right) . \qquad (22)$$



3) Straightforward Decompositions: The expression (22) for an objective function of a binary
constraint optimization problem also defines a straightforward way to decompose the energy
function. That is,

$$E_i(x) = f_i(x_i) + \frac{1}{2} \sum_{j \in \mathcal{N}(i)} f_{ij}(x_i, x_j), \quad \text{for } i = 1, 2, \ldots, n . \qquad (23)$$

Obviously, $\sum_{i=1}^{n} E_i(x) = E(x)$. This kind of decomposition is called the straightforward
decomposition. The sub-objective functions $E_i(x)$ in the straightforward decompositions can be
easily minimized as

$$\min_{x} E_i(x) = \min_{x_i} \left( f_i(x_i) + \frac{1}{2} \sum_{j \in \mathcal{N}(i)} \min_{x_j} f_{ij}(x_i, x_j) \right) .$$

The graphical representation of an objective function can help us to visualize its straightforward
decomposition. For example, the decomposition of the objective
function of the simple example presented in Subsection III-E can be viewed as an instance of this
kind of decomposition. The original objective function has a graphical representation shown
in Figure 4. Each sub-objective function $E_i(x)$ of the decomposition can also be represented
by a graph, which must be a subgraph of the original one. The graphical representation of the
decomposition is illustrated in Figure 5. In the figure we can see that the original loopy graph
is decomposed into five loop-free subgraphs.

In general, in a graphical representation, the straightforward decompositions given in (23)
can be understood as decomposing a loopy graph into $n$ loop-free subgraphs, with one subgraph
associated with each variable node in the original graph. In the decomposition, each subgraph
is of a star-like structure with its associated variable node as the star center. It consists of the
variable node, the neighbors of the node, and the edges connecting the node with its neighbors.
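
The star-shaped sub-problems of the straightforward decomposition can be sketched as follows. This is an illustration only: the data layout (lists of unary cost tables and a dictionary of pairwise tables keyed by `(i, j)` with `i < j`), the helper names, and the small triangle example are all assumptions made for the sketch, not taken from the paper.

```python
def pair_cost(binary, i, xi, j, xj):
    """Look up f_ij(x_i, x_j); each unordered pair is stored once as (min, max)."""
    if i < j:
        return binary[(i, j)][xi][xj]
    return binary[(j, i)][xj][xi]


def neighbors_of(binary, n):
    """Build the neighbor index lists N(i) from the set of pairwise tables."""
    neighbors = [[] for _ in range(n)]
    for i, j in binary:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors


def total_cost(unary, binary, x):
    """E(x): all unary terms plus every pairwise term counted once."""
    cost = sum(unary[i][x[i]] for i in range(len(unary)))
    return cost + sum(table[x[i]][x[j]] for (i, j), table in binary.items())


def sub_objective(i, unary, binary, neighbors, x):
    """E_i(x): the unary term of x_i plus half of each incident pairwise term."""
    return unary[i][x[i]] + 0.5 * sum(
        pair_cost(binary, i, x[i], j, x[j]) for j in neighbors[i])


def star_minimum(i, unary, binary, neighbors):
    """min E_i: each neighbor is minimized independently for every value of x_i."""
    return min(
        unary[i][xi] + 0.5 * sum(
            min(pair_cost(binary, i, xi, j, xj) for xj in range(len(unary[j])))
            for j in neighbors[i])
        for xi in range(len(unary[i])))


# Hypothetical loopy example: a triangle of three binary variables.
unary = [[0, 2], [1, 0], [3, 1]]
binary = {(0, 1): [[0, 1], [1, 0]], (1, 2): [[0, 2], [2, 0]], (0, 2): [[0, 1], [1, 0]]}
neighbors = neighbors_of(binary, len(unary))
print(star_minimum(0, unary, binary, neighbors))
```

Because every edge contributes half of its cost to each of its two endpoints, the sub-objective functions sum back to the original objective for any assignment.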

4) Graph-Based Decompositions: The graphical representation of an objective function may
contain many loops. That is the major cause of the difficulty in minimizing the objective function.
If the graph is loop-free, there exist algorithms with linear complexity (e.g., dynamic programming)
that can minimize the objective function efficiently. Therefore, if we can decompose an
objective function with a loopy graphical representation into a number of sub-objective functions
with loop-free graphical representations, a hard optimization problem is broken into a
number of sub-problems of lower complexity. Cooperative optimization can then be applied
to solve those sub-problems in a cooperative way. This kind of decomposition is called the
graph-based decomposition.

It is important to note that the modification given in © for a sub-objective function does
not change its graphical structure. In other words, every modified sub-objective function defined
in © has exactly the same graphical structure as its original one. This is because only unary
functions, the assignment constraints $\Psi_i^{(k)}(x_i)$, are introduced in the definition. Therefore, any
optimization algorithm applicable to the original sub-objective functions should also be applicable
to the modified ones. In other words, if a sub-objective function is of a tree-like structure, then
its modified version defined by © must have exactly the same tree-like structure. Both of them
can be minimized efficiently via dynamic programming.
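
A minimal sketch of dynamic programming on a tree-structured objective is given below. The function names, the data layout, and the four-node example are hypothetical; the brute-force routine is included only to check the result on this small case. Since the modification adds only unary terms, the modified version of a sub-objective function can be handled by adding those terms to `unary` before calling the same routine.

```python
from itertools import product


def minimize_tree(unary, binary, root=0):
    """Minimum of sum_i f_i(x_i) + sum_(i,j) f_ij(x_i, x_j) over a tree."""
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in binary:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def cost(i, xi, j, xj):
        if (i, j) in binary:
            return binary[(i, j)][xi][xj]
        return binary[(j, i)][xj][xi]

    # Breadth-first order from the root, so every node follows its parent.
    parent = {root: None}
    order = [root]
    for node in order:
        for other in neighbors[node]:
            if other not in parent:
                parent[other] = node
                order.append(other)

    # best[i][xi]: minimal cost of the subtree rooted at i given x_i = xi.
    best = [list(costs) for costs in unary]
    for node in reversed(order[1:]):  # leaves first, root excluded
        p = parent[node]
        for xp in range(len(unary[p])):
            best[p][xp] += min(
                cost(p, xp, node, xn) + best[node][xn]
                for xn in range(len(unary[node])))
    return min(best[root])


def brute_force(unary, binary):
    """Exhaustive minimum, usable only for very small problems."""
    domains = [range(len(costs)) for costs in unary]
    return min(
        sum(unary[i][x[i]] for i in range(len(unary)))
        + sum(table[x[i]][x[j]] for (i, j), table in binary.items())
        for x in product(*domains))


# Hypothetical tree: node 1 is connected to nodes 0, 2 and 3.
tree_unary = [[0, 3], [2, 0], [1, 1], [4, 0]]
tree_binary = {(0, 1): [[0, 2], [2, 0]], (1, 2): [[0, 1], [1, 0]], (1, 3): [[0, 3], [3, 0]]}
print(minimize_tree(tree_unary, tree_binary), brute_force(tree_unary, tree_binary))
```

Each edge is visited once and costs a number of operations proportional to the product of the two domain sizes, which is where the linear dependence on the number of nodes comes from.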

Fig. 6. Decomposing a loopy graph into a number of spanning trees, one spanning tree associated with each variable node
(double circled).

5) Spanning-Tree Based Decompositions: In terms of the graph-based decompositions, the
straightforward decompositions are based on the direct neighbors of each variable node. Another
possible way of decomposing an objective function is based on the spanning trees of the graph
representing the function. A tree is called a spanning tree of an undirected graph $G$ if it is a
subgraph of $G$ and contains all vertices of $G$. Every finite connected graph $G$ contains at least
one spanning tree $T$.

Consider an objective function $E(x)$ of $n$ variables in the form of (21). Let $G = (V, E)$ be its
graphical representation with $n$ variable nodes. Without loss of generality, we assume that $G$
is connected (otherwise the original minimization problem can be broken into
several independent sub-problems). For each variable node $x_i$ of $G$, we can associate a spanning
tree of $G$, denoted as $T_i = (V, E_i)$ ($T_i$ shares the same set of nodes as $G$). There are $n$ such
spanning trees in total, $T_1, T_2, \ldots, T_n$ (some trees may possibly be duplicated). We also choose
those $n$ spanning trees in such a way that each edge of $G$ is covered by at least one of the $n$
trees. Figure 6 shows an example of decomposing the graph representing the objective function
of the simple example into five spanning trees.

After decomposing the graph $G$ into $n$ spanning trees, suppose we can define a sub-objective function
$E_i(x)$ for each spanning tree such that

1) $E(x) = \sum_i E_i(x)$;

2) The graphical representation of $E_i(x)$ is $T_i$, for $i = 1, 2, \ldots, n$.

Then the set $\{E_i(x)\}$ is a legitimate decomposition of the original objective function $E(x)$. This
kind of decomposition is called the spanning-tree based decomposition.

Given an objective function $E(x)$ of a binary constraint optimization problem with a graphical
representation $G$ and $n$ spanning trees, each unary constraint $f_j(x_j)$ of $E(x)$ is associated with
a variable node $x_j$ in $G$ covered by all the $n$ spanning trees. Each binary constraint $f_{jk}(x_j, x_k)$
of $E(x)$ is associated with an edge $(x_j, x_k)$ in $G$ covered by at least one of the $n$ spanning trees.
Assume that the edge $(x_j, x_k) \in E$ is covered by $m(j, k)$ spanning trees, where $m(j, k)$ is
a positive integer. One way of defining sub-objective functions $E_i(x)$ to satisfy the above
two conditions is given as follows,

$$E_i(x) = \frac{1}{n} \sum_{j=1}^{n} f_j(x_j) + \sum_{(x_j, x_k) \in E_i} \frac{1}{m(j, k)} f_{jk}(x_j, x_k) , \qquad (24)$$

for $i = 1, 2, \ldots, n$.

6) Some Properties of Spanning-Tree Based Decomposition: We can apply algebraic graph
theory to reveal some properties of graphs and their spanning trees. Let $G = (V, E)$ be a
finite simple graph of $n$ nodes, with vertices $v_1, \ldots, v_n$. If $G$ is a graphical representation of an
objective function, then the vertex $v_i$ is the variable node $x_i$, i.e., $v_i = x_i$.

The connectivity of a graph $G$ can be represented by the adjacency matrix $A$ of $G$. The
adjacency matrix $A = (a_{ij})_{n \times n}$ of $G$ is defined as

$$a_{ij} = \begin{cases} 1, & \text{if } (v_i, v_j) \in E, \\ 0, & \text{otherwise.} \end{cases}$$

The adjacency matrix of an undirected graph is symmetric.

Let $Q$ be the adjacency matrix of $G$ where the diagonal entries $Q_{ii}$ are replaced by the negated
degrees of the vertices, $-\deg(v_i)$. Let $Q^*$ be the matrix obtained by removing the first row and column of $Q$.
Then the number of spanning trees in $G$ is equal to $|\det(Q^*)|$ (Kirchhoff's Matrix-Tree Theorem).
In particular, if $G$ is a complete graph, $|\det(Q^*)| = n^{n-2}$. That is, every complete
graph with $n$ vertices ($n > 1$) has exactly $n^{n-2}$ spanning trees (Cayley's theorem).
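
The matrix-tree computation can be sketched without external libraries. The function name and the edge-list input are assumptions of this sketch; it uses the Laplacian (degrees on the diagonal, -1 per edge), which is the negative of the matrix $Q$ above, so the absolute value of the determinant is unchanged. Exact rational arithmetic avoids rounding problems.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Count spanning trees of a simple graph on nodes 0..n-1."""
    laplacian = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        laplacian[i][i] += 1
        laplacian[j][j] += 1
        laplacian[i][j] -= 1
        laplacian[j][i] -= 1

    # Remove the first row and column, then take the determinant
    # by Gaussian elimination with row swaps.
    minor = [row[1:] for row in laplacian[1:]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if minor[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            minor[col], minor[pivot] = minor[pivot], minor[col]
            det = -det
        det *= minor[col][col]
        for r in range(col + 1, size):
            factor = minor[r][col] / minor[col][col]
            for c in range(col, size):
                minor[r][c] -= factor * minor[col][c]
    return int(abs(det))


complete_4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
cycle_4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_spanning_trees(4, complete_4), count_spanning_trees(4, cycle_4))
```

For the complete graph on four nodes the count agrees with Cayley's formula, $4^{4-2} = 16$, and a four-node cycle has four spanning trees, one for each edge that can be removed.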

7) Further Generalization of Graph-Based Decompositions: Using factor graphs [22], any
objective function containing $k$-ary constraints, where $k$ can be any integer, can be
represented as a graph. With this representation, the two aforementioned kinds of decompositions
can be easily generalized further to decompose the objective function. Special decompositions
can also be explored for graphs with special structures or constraints with special properties
to maximize the power of cooperative optimization algorithms. Another way to apply the two
kinds of decompositions to the $k$-ary constraints case is to convert the constraints of orders
higher than two into binary constraints via the variable clustering technique.

B. The Propagation Matrix 

As mentioned in the previous subsection, the objective function of a binary constraint optimization
problem has a graphical representation $G = (V, E)$. For the straightforward decompositions
described in the previous subsection, we can design a propagation matrix based on the adjacency
of the graph $G$ as follows,

$$W = (w_{ij})_{n \times n}, \quad w_{ij} = \begin{cases} 1/d_j, & \text{if } (x_i, x_j) \in E, \\ 0, & \text{otherwise,} \end{cases} \qquad (25)$$

where $d_j$ is the degree of the variable node $x_j$. The propagation matrix © of the simple example
is designed in this way.

Another way to design a propagation matrix is given as follows,

$$w_{ij} = \begin{cases} 1/(d_j + 1), & \text{if } (x_i, x_j) \in E \text{ or } i = j, \\ 0, & \text{otherwise.} \end{cases} \qquad (26)$$

Such a matrix has non-zero values for all of its diagonal elements.
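
Both designs can be sketched from an edge list as follows. The sketch reads (25) as $w_{ij} = 1/d_j$ and (26) as $w_{ij} = 1/(d_j + 1)$, with $d_j$ the degree of node $x_j$; the function name, the `include_diagonal` flag, and the example graph are hypothetical.

```python
def propagation_matrix(n, edges, include_diagonal=False):
    """Build W from an undirected edge list over nodes 0..n-1.

    include_diagonal=False follows the first design (zero diagonal);
    include_diagonal=True follows the second one (non-zero diagonal).
    """
    degree = [0] * n
    adjacent = set()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        adjacent.add((i, j))
        adjacent.add((j, i))

    extra = 1 if include_diagonal else 0
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if (i, j) in adjacent or (include_diagonal and i == j):
                w[i][j] = 1.0 / (degree[j] + extra)
    return w


# Hypothetical graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
w_plain = propagation_matrix(4, edges)
w_diag = propagation_matrix(4, edges, include_diagonal=True)
column_sums = [sum(w_plain[i][j] for i in range(4)) for j in range(4)]
print(column_sums)
```

With either design every column sums to one (for nodes with at least one neighbor in the first design), which matches the condition $\sum_i w_{ij} = 1$ used in the global optimality condition.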

C. Cooperative Optimization in Simple Form 

The design of cooperative optimization algorithms is not trivial even with the aforementioned
guidelines. In the basic canonical form ©, there are $n \times n$ values for the propagation matrix
$(w_{ij})_{n \times n}$ and a series of values for the cooperation strength $\lambda_k$. To ease the design job for
engineers and practitioners, the difference equations © of the basic canonical form can be
simplified to

$$\Psi_i^{(k)}(x_i) = \min_{X_i \setminus x_i} \left( E_i(x) + \alpha \sum_{j} \Psi_j^{(k-1)}(x_j) \right) , \quad \text{for } i = 1, 2, \ldots, n , \qquad (27)$$

where the sum is over those $j$ for which $x_j$ is contained in $E_i(x)$, and $\alpha$ is the only parameter
to be tuned in experiments to maximize performance. It plays
the same role as the cooperation strength for controlling the cooperation level among the
agents in a system. The above set of simplified difference equations defines the simple form of
cooperative optimization.

The simple form is derived from the basic canonical form © by setting $w_{ij}$ to a positive
constant $w$, for any $i$ and $j$, if $x_j$ is contained in $E_i(x)$, and $w_{ij} = 0$ otherwise. We also let the
cooperation strength be of a constant value $\lambda$. Letting $\alpha = \lambda w$, © simplifies to (27).

If the parameter $\alpha$ is of a large value, the difference equations (27) of a simple cooperative
optimization algorithm may have value overflow problems in computing the assignment constraints
$\Psi_i^{(k)}(x_i)$. To improve its convergence property, we can offset each $\Psi_i^{(k)}(x_i)$ by a value
at each iteration. One choice is the minimal value of $\Psi_i^{(k)}(x_i)$. That is, we offset $\Psi_i^{(k)}(x_i)$ by its
minimal value as follows,

$$\Psi_i^{(k)}(x_i) := \Psi_i^{(k)}(x_i) - \min_{x_i} \Psi_i^{(k)}(x_i) .$$

Thus, the offsetting defines an operator on $\Psi_i^{(k)}(x_i)$, denoted as $O(\Psi_i^{(k)}(x_i))$. With the notation,
the difference equations of a simple cooperative optimization algorithm become

$$\Psi_i^{(k)}(x_i) = O\left( \min_{X_i \setminus x_i} \left( E_i(x) + \alpha \sum_{j} \Psi_j^{(k-1)}(x_j) \right) \right) , \quad \text{for } i = 1, 2, \ldots, n . \qquad (28)$$
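
The simple form with offsetting can be sketched for a binary constraint optimization problem as follows. This is a minimal illustration under several assumptions that are not stated in the text: the straightforward decomposition is used, the sum of assignment constraints runs over the neighbors of $x_i$ only, all assignment constraints start at zero, and the function name, data layout, and Potts-style example are hypothetical. The defaults for `alpha` and `iterations` mirror the values reported in the case study.

```python
def simple_cooperative_optimization(unary, binary, alpha=0.16, iterations=16):
    """Iterate the simple form with offsetting.

    unary[i][a] is f_i(a); binary[(i, j)][a][b] is f_ij(a, b), stored once per edge.
    Returns the per-variable minimizers of the final assignment constraints
    together with the constraints themselves.
    """
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in binary:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def pair_cost(i, a, j, b):
        if (i, j) in binary:
            return binary[(i, j)][a][b]
        return binary[(j, i)][b][a]

    psi = [[0.0] * len(costs) for costs in unary]  # assumed zero initial condition
    for _ in range(iterations):
        updated = []
        for i in range(n):
            row = []
            for a in range(len(unary[i])):
                # The sub-problem is star-shaped, so each neighbor is minimized
                # independently once x_i = a is fixed.
                value = unary[i][a]
                for j in neighbors[i]:
                    value += min(
                        0.5 * pair_cost(i, a, j, b) + alpha * psi[j][b]
                        for b in range(len(unary[j])))
                row.append(value)
            lowest = min(row)
            updated.append([v - lowest for v in row])  # the offsetting operator
        psi = updated

    solution = [min(range(len(row)), key=row.__getitem__) for row in psi]
    return solution, psi


# Hypothetical example: three binary variables on a triangle, Potts-style pairwise cost.
potts = [[0.0, 1.0], [1.0, 0.0]]
unary = [[0.0, 5.0], [0.0, 5.0], [0.0, 5.0]]
binary = {(0, 1): potts, (1, 2): potts, (0, 2): potts}
solution, psi = simple_cooperative_optimization(unary, binary)
print(solution)
```

After offsetting, the smallest entry of every assignment constraint is zero, so the values stay bounded regardless of how many iterations are run.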




VII. A Case Study in Computer Vision 

Just like many other problems in computer vision and image processing, stereo matching can
be formulated as a binary constraint optimization problem with an objective function $E(x)$ in
the form of (21). For details about the energy function definitions used for stereo matching, please
see [13]. Basically, a unary constraint $f_i(x_i)$ in (21) measures the difference of the intensities
between site $i$ from one image and its corresponding site in another image given the depth of
the site. A binary constraint $f_{ij}(x_i, x_j)$ measures the difference of the depths between site $i$ and
site $j$. This type of constraint is also referred to as the smoothness constraint in the literature. It
has also been widely used in solving image segmentation and other vision tasks.

In our experiments, we apply the simplified form of cooperative optimization (28) to stereo
matching with the parameter $\alpha$ set to 0.16. The maximum number of iterations is set to 16. The
objective function associated with stereo matching is decomposed using the spanning-tree
based decomposition. The details of the decomposition and the minimization of the sub-objective
functions are offered in the following subsection.

A. Decomposing Grid-like Graphs

Often, the graphical representation of the objective function of an image segmentation
problem or a stereo matching problem is of a 2-D grid-like structure. Because a 2-D grid-like
graph is highly regular in structure, its spanning trees can be easily defined in a systematic way.

Given an objective function of a 2-D grid-like graphical representation $G = (V, E)$, let $M$
be the height of the grid (the number of rows) and $N$ be the width of the grid (the number of
columns). Let $E^h$ be the set of all horizontal edges of $G$ and $E^v$ be the set of all vertical edges.
There are in total $M \times N$ nodes, one for each variable. There are in total $M(N - 1)$ horizontal
edges and $N(M - 1)$ vertical edges. With the notations, the objective function can be expressed
as

$$E(x) = \sum_{i \in V} f_i(x_i) + \sum_{(i,j) \in E} f_{ij}(x_i, x_j) ,$$

or equivalently,

$$E(x) = \sum_{i \in V} f_i(x_i) + \sum_{(i,j) \in E^h} f_{ij}(x_i, x_j) + \sum_{(i,j) \in E^v} f_{ij}(x_i, x_j) .$$

The horizontal path $P_i^h = (V_i^h, E_i^h)$ through a variable node $x_i$ consists of all the nodes on the
same horizontal line as $x_i$, together with the edges connecting those nodes. The vertical path
$P_i^v = (V_i^v, E_i^v)$ through a variable node $x_i$ consists of all the nodes on the same vertical line as
$x_i$, together with the edges connecting those nodes.

For each variable node $x_i$, let us define two spanning trees with the node as the root, called
the horizontal spanning tree $T_i^h$ and the vertical spanning tree $T_i^v$, respectively. The horizontal
spanning tree $T_i^h$ consists of the horizontal path through the variable node $x_i$ and all the vertical
paths through each node in the horizontal path. The vertical spanning tree $T_i^v$ consists of the
vertical path through the variable node and all the horizontal paths through each node in the
vertical path (the illustrations are shown in Fig. 7).

Let the functions $E_i^h(x)$, $E_i^v(x)$ be the objective functions associated with the horizontal
spanning tree $T_i^h$ and the vertical spanning tree $T_i^v$, respectively. Following the general design
guideline described in the previous section for the spanning-tree based decompositions (see
Eq. (24)), we can define $E_i^h(x)$ and $E_i^v(x)$ as

$$E_i^h(x) = a \sum_{i' \in V} f_{i'}(x_{i'}) + b \sum_{(i',j') \in E_i^h} f_{i'j'}(x_{i'}, x_{j'}) + c \sum_{(i',j') \in E^v} f_{i'j'}(x_{i'}, x_{j'}) ,$$

$$E_i^v(x) = a \sum_{i' \in V} f_{i'}(x_{i'}) + c \sum_{(i',j') \in E_i^v} f_{i'j'}(x_{i'}, x_{j'}) + b \sum_{(i',j') \in E^h} f_{i'j'}(x_{i'}, x_{j'}) ,$$

where $a = 1/(2MN)$, $b = 1/(MN + N)$, and $c = 1/(MN + M)$.

Fig. 7. A horizontal spanning tree (left) and a vertical spanning tree (right) with the variable node xe as their roots.

The sub-objective function $E_i(x)$ associated with variable $x_i$ is defined as

$$E_i(x) = E_i^h(x) + E_i^v(x) . \qquad (29)$$

Clearly, we have $\sum_i E_i(x) = E(x)$.
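
The weights $b$ and $c$ can be checked by counting how many of the $2MN$ spanning trees cover each edge. The sketch below builds the horizontal and vertical trees of every node of a small grid explicitly; the function name and the grid size are assumptions made for the illustration.

```python
def grid_tree_coverage(rows, cols):
    """Count, for each grid edge, how many of the 2*rows*cols trees contain it."""
    h_edges = [((r, c), (r, c + 1)) for r in range(rows) for c in range(cols - 1)]
    v_edges = [((r, c), (r + 1, c)) for r in range(rows - 1) for c in range(cols)]
    coverage = {edge: 0 for edge in h_edges + v_edges}
    tree_sizes = []
    for r in range(rows):
        for c in range(cols):
            # Horizontal tree: the row through (r, c) plus every vertical edge.
            h_tree = [e for e in h_edges if e[0][0] == r] + v_edges
            # Vertical tree: the column through (r, c) plus every horizontal edge.
            v_tree = [e for e in v_edges if e[0][1] == c] + h_edges
            for tree in (h_tree, v_tree):
                tree_sizes.append(len(tree))
                for edge in tree:
                    coverage[edge] += 1
    return h_edges, v_edges, coverage, tree_sizes


M, N = 3, 4  # hypothetical grid: 3 rows, 4 columns
h_edges, v_edges, coverage, tree_sizes = grid_tree_coverage(M, N)
print({coverage[e] for e in h_edges}, {coverage[e] for e in v_edges})
```

Every horizontal edge is covered by $MN + N$ trees and every vertical edge by $MN + M$ trees, so weighting them by $b$ and $c$ makes each binary constraint sum to one over all sub-objective functions; each unary constraint appears in all $2MN$ trees, hence the weight $a$.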

As mentioned before, any objective function of a tree-like structure can be minimized efficiently
using the dynamic programming technique. It is of a linear computational complexity
and is simply based on local message passing from the leaf nodes all the way back to the root
node. The books [41], [23] offer a detailed explanation of message passing and the dynamic
programming technique. When applying the technique for minimizing the objective function of
a horizontal or vertical spanning tree, the message flows among the variable nodes are illustrated
in Figure 8.

B. Experimental Results 

The Middlebury College evaluation framework [13] for stereo matching is used in the experiments.
The script used for evaluation is based on exp6_gc.txt offered in the framework. The
other settings come from the default values in the framework. The results of the cooperative
optimization algorithm and the graph cut algorithm, together with the ground truths for the four
test stereo image pairs from the evaluation framework, are shown in Figure 9. On visual inspection,
the solution quality of the two algorithms is very close.

Fig. 8. The message flows among the variable nodes when applying the dynamic programming technique for minimizing the 
objective functions associated with the horizontal spanning tree (left) and the vertical spanning tree (right) with the variable 
node x G at their roots. 




Fig. 9. The ground truths (the left column), cooperative optimization (the middle column), and graph cuts (the right column). 

The following four tables show the performance of the cooperative optimization algorithm
(upper rows in a table) and the graph cut algorithm (lower rows in a table) over the four test
image sets. The solution quality is measured in the overall area, non-occluded areas, occluded
areas, textured areas, texture-less areas, and near-discontinuity areas (see [13] for the detailed
description of the evaluation framework). Neither algorithm handles occluded areas (an
occluded area is one that is visible in one image, but not the other). Also, the runtimes of the two
algorithms (co = cooperative optimization algorithm, gc = graph cuts) are listed. From the tables
we can see that cooperative optimization is very close to graph cuts in terms of solution
quality and energy states. However, the cooperative optimization algorithm is around 20 times
faster than graph cuts in the software simulation.



image = Map

time: co=17s / gc=337s    energy: co=328,658 / gc=321,144

                 ALL      NON OCCL   OCCL      TEXTRD   TEXTRLS   D_DISCNT
co  Error        4.04     0.85       16.18     0.85     0.36      2.84
co  Bad Pixels   5.35%    0.18%      86.78%    0.18%    0.00%     2.46%
gc  Error        3.91     1.07       15.45     1.07     0.38      3.65
gc  Bad Pixels   5.63%    0.36%      88.76%    0.36%    0.00%     4.52%


image = Sawtooth

time: co=33s / gc=673s    energy: co=1,430,450 / gc=1,418,015

                 ALL      NON OCCL   OCCL      TEXTRD   TEXTRLS   D_DISCNT
co  Error        1.46     0.61       7.92      0.63     0.33      1.56
co  Bad Pixels   3.93%    1.35%      93.06%    1.48%    0.14%     5.96%
gc  Error        1.49     0.70       7.88      0.73     0.40      1.60
gc  Bad Pixels   3.99%    1.38%      94.02%    1.49%    0.31%     6.39%



image = Tsukuba

time: co=20s / gc=476s    energy: co=517,591 / gc=503,962

                 ALL      NON OCCL   OCCL      TEXTRD   TEXTRLS   D_DISCNT
co  Error        1.30     0.99       5.41      1.00     0.97      2.01
co  Bad Pixels   4.77%    2.59%      87.38%    2.57%    2.61%     10.63%
gc  Error        1.25     0.92       5.35      1.04     0.73      2.02
gc  Bad Pixels   4.24%    2.04%      87.60%    2.77%    1.05%     10.00%
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image = Venus 








time: co= 


=35s / gc= 


=573s 


energy: co 


= 1,253,764/ gc= 


: 1,246,078 




ALL 


NON OCCL 


OCCL 


TEXTRD 


TEXTRLS 


DJ3ISCNT 


Error 


1.58 


1.11 


8.29 


0.93 


1.42 


1.49 


Bad Pixels 


3.29% 


1.65% 


90.72% 


1.38% 


2.20% 


7.28% 


Error 


1.47 


0.95 


8.33 


0.81 


1.18 


1.31 


Bad Pixels 


3.58% 


1.93% 


91.55% 


1.56% 


2.68% 


6.84% 
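
The runtime and energy comparison can be checked directly from the figures reported for the four images. The following is a minimal sketch, not part of the original experiments: it copies the reported runtimes and final energies and computes the speed-up of cooperative optimization over graph cuts and the relative energy gap. The names `results`, `summarize`, `summary`, and `mean_speedup` are illustrative only.

```python
# (runtime_co_s, runtime_gc_s, energy_co, energy_gc), copied from the reported results.
results = {
    "Map": (17, 337, 328_658, 321_144),
    "Sawtooth": (33, 673, 1_430_450, 1_418_015),
    "Tsukuba": (20, 476, 517_591, 503_962),
    "Venus": (35, 573, 1_253_764, 1_246_078),
}


def summarize(results):
    """Return {image: (speed-up of co over gc, energy gap of co vs gc in percent)}."""
    rows = {}
    for name, (t_co, t_gc, e_co, e_gc) in results.items():
        speedup = t_gc / t_co
        gap_percent = 100.0 * (e_co - e_gc) / e_gc
        rows[name] = (speedup, gap_percent)
    return rows


summary = summarize(results)
for name, (speedup, gap_percent) in summary.items():
    print(f"{name:9s} speed-up {speedup:5.1f}x  energy gap {gap_percent:+.2f}%")

mean_speedup = sum(s for s, _ in summary.values()) / len(summary)
print(f"mean speed-up {mean_speedup:.1f}x")
```

Under these figures the speed-up ranges from about 16 (Venus) to about 24 (Tsukuba), and the energy reached by cooperative optimization stays within 3% of the graph cut energy on every image.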



VIII. Conclusions 

Cooperative optimization offers a general, distributed optimization method for attacking 
hard optimization problems. Soft decision making, message passing, and solution compromising 
are three important techniques for achieving cooperation among agents in a cooperative 
optimization system. The global optimality property of consensus solutions gives the agents 
in a system an appealing reason to compromise their solutions so that conflicts among those 
solutions can be resolved. The insights gained from studying cooperative optimization might help 
us apply the cooperation principle to understand or solve more generic decision optimization 
problems arising in fields such as neuroscience, business management, political management, and 
the social sciences. 

IX. Appendix 

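A. Proof of Theorem 4.4 

Proof: Although the proof of the theorem is simple and straightforward, the property it 
reveals is important for cooperative optimization. 

Since $\tilde{x}$ is a consensus solution, at equilibrium we have 

$$\Psi_i^{(\infty)}(\tilde{x}_i) = (1-\lambda) E_i(\tilde{x}) + \lambda \sum_j w_{ij} \Psi_j^{(\infty)}(\tilde{x}_j),$$

for $1 \le i \le n$. Summing them up, we have 

$$\sum_i \Psi_i^{(\infty)}(\tilde{x}_i) = \sum_i \Big( (1-\lambda) E_i(\tilde{x}) + \lambda \sum_j w_{ij} \Psi_j^{(\infty)}(\tilde{x}_j) \Big) = (1-\lambda) E(\tilde{x}) + \lambda \sum_j \Psi_j^{(\infty)}(\tilde{x}_j),$$

where the last step uses $E(x) = \sum_i E_i(x)$ and $\sum_i w_{ij} = 1$ for every $j$. That is, 

$$E(\tilde{x}) = \sum_i \Psi_i^{(\infty)}(\tilde{x}_i). \qquad (30)$$

For any $x$, from (3), we have 

$$\Psi_i^{(\infty)}(x_i) \le (1-\lambda) E_i(x) + \lambda \sum_j w_{ij} \Psi_j^{(\infty)}(x_j),$$

for $1 \le i \le n$. Summing them up, we have 

$$\sum_i \Psi_i^{(\infty)}(x_i) \le \sum_i \Big( (1-\lambda) E_i(x) + \lambda \sum_j w_{ij} \Psi_j^{(\infty)}(x_j) \Big) = (1-\lambda) E(x) + \lambda \sum_j \Psi_j^{(\infty)}(x_j).$$

That is, 

$$E(x) \ge \sum_i \Psi_i^{(\infty)}(x_i). \qquad (31)$$

Subtracting (30) from (31), we have 

$$E(x) - E(\tilde{x}) \ge \sum_i \Big( \Psi_i^{(\infty)}(x_i) - \Psi_i^{(\infty)}(\tilde{x}_i) \Big). \qquad (32)$$

Because 

$$\Psi_i^{(\infty)}(x_i) \ge \Psi_i^{(\infty)}(\tilde{x}_i),$$

from (32), we have 

$$E(x) - E(\tilde{x}) \ge 0.$$

Therefore, $\tilde{x}$ must be the global minimum of $E(x)$. 

This completes the proof. ■ 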

B. Proof of Lemma 4.1 

Proof: For any numbers $\lambda_1, \ldots, \lambda_n$ in $[0, 1)$, the following inequality can be proved by the 
principle of mathematical induction, 

$$\lambda_1 \lambda_2 \cdots \lambda_n \ge 1 - (1-\lambda_1) - (1-\lambda_2) - \cdots - (1-\lambda_n).$$

If $\sum_{k=1}^{\infty} (1-\lambda_k)$ converges, there exists $N$ such that for all $n \ge N$, 

$$(1-\lambda_N) + \cdots + (1-\lambda_n) < \frac{1}{2}.$$

Therefore, defining $g(n)$ as 

$$g(n) = \prod_{k=1}^{n} \lambda_k,$$

we have that for all $n \ge N$, 

$$\frac{g(n)}{g(N-1)} = \lambda_N \cdots \lambda_n \ge 1 - \big( (1-\lambda_N) + \cdots + (1-\lambda_n) \big) > \frac{1}{2}.$$

Therefore, the sequence $\{g(n)\}_{n \ge N}$ is a non-increasing sequence bounded from below by 
$g(N-1)/2 > 0$. It must have a positive limit $\epsilon > 0$ so that 

$$\lim_{n \to \infty} \prod_{k=1}^{n} \lambda_k = \epsilon > 0.$$

Using the inequality $1 - x \le e^{-x}$ when $x \in [0, 1)$, we have that 

$$g(n) = \prod_{k=1}^{n} \big( 1 - (1-\lambda_k) \big) \le e^{-((1-\lambda_1) + \cdots + (1-\lambda_n))}.$$

If $\sum_{k=1}^{\infty} (1-\lambda_k)$ is a divergent series, we have 

$$\lim_{n \to \infty} \prod_{k=1}^{n} \lambda_k = 0.$$

This completes the proof. ■ 
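
The two cases of the lemma can be illustrated numerically. The sketch below is an illustration under chosen assumptions, not part of the original proof: it picks one sequence for which the series of $(1-\lambda_k)$ converges, $\lambda_k = 1 - 1/(k+1)^2$, and one for which it diverges, $\lambda_k = 1 - 1/(k+1)$. Both products telescope, to $(n+2)/(2(n+1))$ and $1/(n+1)$ respectively, so the first tends to $1/2$ and the second to $0$. The helper names `partial_products` and `bounds_hold` are hypothetical.

```python
import math


def partial_products(lams):
    """Return the running products g(1), g(2), ..., g(n) of the factors in lams."""
    products, g = [], 1.0
    for lam in lams:
        g *= lam
        products.append(g)
    return products


def bounds_hold(lams, tol=1e-12):
    """Check 1 - sum(1 - lam_k) <= g(n) <= exp(-sum(1 - lam_k)) for every prefix."""
    g, s = 1.0, 0.0
    for lam in lams:
        g *= lam
        s += 1.0 - lam
        if not (1.0 - s - tol <= g <= math.exp(-s) + tol):
            return False
    return True


n = 2000
# Convergent case: sum of 1/(k+1)^2 is finite, so the product has a positive limit.
conv = [1.0 - 1.0 / (k + 1) ** 2 for k in range(1, n + 1)]
# Divergent case: sum of 1/(k+1) is infinite, so the product tends to zero.
div = [1.0 - 1.0 / (k + 1) for k in range(1, n + 1)]

g_conv = partial_products(conv)
g_div = partial_products(div)

print("convergent series, g(n):", g_conv[-1])
print("divergent series,  g(n):", g_div[-1])
print("bounds hold:", bounds_hold(conv), bounds_hold(div))
```

Both inequalities used in the proof, the lower bound from induction and the upper bound from $1 - x \le e^{-x}$, are checked for every partial product by `bounds_hold`.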